Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve docs on some char boolean methods #61794

Closed
wants to merge 5 commits into from
Closed

Conversation

spl
Copy link
Contributor

@spl spl commented Jun 13, 2019

This is an attempt to improve some of the documentation on char methods that reference Unicode aspects. My goals are:

  • Make the docs more uniform and precise in wording
  • Link to the relative Unicode documentation for reference
  • Use the latest https://www.unicode.org links to always point to the latest standard and documentation rather than a specific version

I believe the uniformity and wording is improved, but perhaps not everyone agrees. You will find, for example, that I replace “is an alphabetic code point” with “has the Alphabetic property” and give a reference for the Alphabetic property. Since the method is named is_alphabetic(), it is slightly redundant to say that the meaning is “is alphabetic,” and it also doesn't provide more information, whereas, the Unicode documentation does. However, you may prefer, in addition to the reference, something like a copy of the Unicode documentation, e.g. “The Alphabetic property is a derived informative property of the primary units of alphabets and/or syllabaries, whether combining or noncombining.” I'm not yet convinced that adding this is useful because it tends to involve a lot of unfamiliar vocabulary for the uninitiated. Perhaps some middle ground could be found here, but I'm not sure what that looks like at the moment.

I link to the latest Unicode documentation since that seems easiest to maintain. However, it is not necessarily the standard implemented. Right now, unicode::tables::UNICODE_VERSION is 11.0.0, but the latest standard is 12.1.0. But if you want to link to the current version implemented, then those documented links should be updated every time the version is updated. Unless there is some automated mechanism for handling the version update, I think it's an acceptable minor error to link to the latest version.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 13, 2019
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:0a966650:start=1560416114330522296,finish=1560416117104220122,duration=2773697826
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:04:20] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:20] tidy error: /checkout/src/libcore/char/methods.rs:736: line longer than 100 chars
[00:04:25] some tidy checks failed
[00:04:25] 
[00:04:25] 
[00:04:25] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:25] 
[00:04:25] 
[00:04:25] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:25] Build completed unsuccessfully in 0:01:13
---
travis_time:end:01dc03e2:start=1560416394010650340,finish=1560416394015418584,duration=4768244
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:03fd0ff2
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:098bdb58
travis_time:start:098bdb58
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:00d0308c
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@Centril
Copy link
Contributor

Centril commented Jun 13, 2019

cc @Manishearth @SimonSapin

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I agree that documenting char::is_lowercase as "Returns true if this char is lowercase" is silly.

I think this change is mostly fine but I would like to try to improve clarity of the all-important first sentence on each method. First sentences are supposed to stand alone to the extent that if you only read the first sentence of each method you still get a reasonable understanding of the point of the method. Documentation writers commonly get this wrong by having the first sentence summarize the rest of the documentation of the method; if the summary can only make sense after reading and understanding the rest of the documentation then the first sentence fails to stand alone.

My feeling is that many of these new first sentences only make sense if the reader already understands the point of the method.

For example you have:

Returns true if this char has the White_Space property.

This sentence means nothing until after the reader learns what "White_Space property" refers to by reading the rest of the documentation.

As I see it there are 2 key things that a first sentence would need to communicate to characterize this method:

  1. You (as the reader) know roughly what is meant by the word "whitespace". This method involves that type of whitespace as you would expect; it isn't some jargon.
  2. But the details are complicated and have something to do with Unicode.

I think the sentence as written isn't effective at communicating either of these things. We can try to brainstorm some alternatives; I feel that something more in this direction would work better:

Returns true if this is a whitespace character as classified by Unicode's White_Space property.

Another dimension to keep in mind about first sentences is that they ideally communicate the point of the method to the intended target audience for the method 1 in terms that makes sense for that audience. The set of people who will want to call is_whitespace() is different and has different background knowledge compared to the set of people who will need to call something like is_grapheme_extended(), so there will tend to be differences in the character of the documentation as a result.

1 Occasionally we extend this to include also the set of people who would be most likely mislead into thinking that they want to call a method they shouldn't, i.e. "this is probably not what you want"-style documentation, but this is uncommon.

@spl
Copy link
Contributor Author

spl commented Jun 14, 2019

@dtolnay Thank you for the enthusiastic response. I'm more than happy to brainstorm alternative approaches.

I generally agree with your summary of the goals of the first sentence.

I concede that I may have erred more on the side of brevity than necessary. I left out “Unicode” since I think it's understood that char is a code point, and I didn't want to be too redundant. But I don't feel strongly against putting it back in.

Considering your specific example:

Returns true if this is a whitespace character as classified by Unicode's White_Space property.

I like the usage of “classified” better than the existing “satisfies” or my “has.” I would amend it slightly to:

Returns true if this is a whitespace character as classified by the Unicode White_Space property.

I think it's better to avoid the possessive form when it's not necessary, and “the Unicode ... property” helps to imply that there's only one and that it's standardized.

Another option would be to re-emphasize the code point aspect:

Returns true if this is a whitespace code point as classified by the Unicode White_Space property.

I feel that char as used in the documentation serves as a meaningful but familiar substitute for “code point,” whereas “character” might contribute to confusion. But this may be a minor issue.

Also, you left out “this char”. Was that intentional? I don't feel strongly either way, but I think there should be some consistency across the board for how function parameters are referenced in the documentation. I'm still somewhat new to Rust, so I don't know what that convention is or should be.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, “the Unicode ... property” works for me. "Code point" instead of "character" is fine too, will leave that up to you. I would lean toward "character" because the type's documentation introduces it as "The char type represents a single character" and talks about "code point" only to clarify what character means on a technical level. I don't think it would be a benefit in clarity to make that distinction again in all of these methods.

I think Unicode is important to mention by name in some form. I would not expect readers to deduce that "the White_Space property" by itself refers to some Unicode thing. Searching for White_Space property brings up information about CSS white-space before anything about Unicode.

“this char” is intentionally omitted. In documentation I typically recommend sticking to English words instead of code blocks where English is sufficiently concise and unambiguous, as in "returns true if ..." rather than "returns true if ...". Code blocks are formatted deliberately to disrupt the reader's flow with a context switch and are only worth using if that is called for.

@dtolnay dtolnay added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 22, 2019
@JohnCSimon
Copy link
Member

Ping from triage
@spl can you please address the changes @dtolnay requested? Thank you.

@spl
Copy link
Contributor Author

spl commented Jul 21, 2019

@JohanLorenzo Yes. Apologies for the delay. I got caught up with something else, and now I'm away from the computer for a week. I'll get to it next week.

@JohanLorenzo
Copy link
Contributor

Hey there 🙂 Thanks for the ping. Although, you probably meant to at-mention @JohnCSimon 😃

@spl
Copy link
Contributor Author

spl commented Jul 22, 2019

@JohanLorenzo Oops! Sorry! I didn't pay close enough attention to GitHub's completion. It usually works so well.

@bors
Copy link
Contributor

bors commented Jul 23, 2019

☔ The latest upstream changes (presumably #62902) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon
Copy link
Member

@spl Can you please address the merge issues? This is so close! Thanks.

@JohnCSimon
Copy link
Member

@spl Ping again from triage ...

@JohnCSimon JohnCSimon added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Aug 17, 2019
@JohnCSimon
Copy link
Member

JohnCSimon commented Aug 17, 2019

hello from triage
@spl @dtolnay
Unfortunately, I'm going to have to close this PR as it has sat idle for far too long, our guidelines are

If the author has not responded to a previous ping, meaning more than 2 weeks have passed with no 
activity, the PR should be closed with a message thanking the author for their work, asking the them 
to reopen when they have a chance to make the necessary changes, and warning them not to push to 
the PR while it is closed as that prevents it from being reopened. Tag the PR with S-inactive-closed.

@JohnCSimon JohnCSimon closed this Aug 17, 2019
Centril added a commit to Centril/rust that referenced this pull request Oct 13, 2019
Improve docs on some char boolean methods

simple revival of rust-lang#61794
(also rustfmt on rest of file :)

Documentation for `is_xid_start()` and `is_xid_continue()` couldn't be improved since both methods got remove from this repository

r? @dtolnay
cc @JohnCSimon
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants